Adapting SProUT to processing Baltic and Slavonic languages
Abstract
This paper presents an initial effort to port SProUT, a novel general-purpose information extraction (IE) platform, to the processing of Baltic and Slavonic languages. We describe the system, characterize these two language groups, and discuss the development of named-entity and chunk grammars for them, which are crucial for solving information extraction tasks.
Related resources
Rule-based Named-Entity Recognition for Polish
Although considerable work on named-entity recognition exists for English and a few other major languages, research on this topic for Slavonic languages has been almost entirely neglected. In this paper, we present an attempt at constructing a named-entity recognition system for Polish on top of SProUT, a novel multilingual NLP platform. We discuss the difficulties encountered, and present...
Semi-automatic Approach to Building Dictionary between Slavonic Languages
Machine translation between Slavonic languages is still in its early stages. The existence of bilingual dictionaries has a big impact on the quality of translation. Unfortunately, creating such language resources is quite expensive. For small languages like Czech, Slovak or Slovenian, it is almost certain that a large enough dictionary will not be commercially successful. Slavonic languages tend to range between...
Gender in Slavonic from the Standpoint of a General Typology of Gender Systems
This paper outlines a general typology of gender systems and locates the Slavonic systems within it. There are two reasons for adopting this approach: first, it gives a new perspective on the Slavonic data; and second, it highlights those features of gender in Slavonic which are of most interest to researchers working in general linguistics. Slavonic is indeed a rich source: its gender systems ...
Development of multi-voice and multi-language TTS synthesizer (languages: Belarussian, Polish, Russian)
The paper describes some results of research aimed at filling the gap in introducing and promoting computerized speech technology for Slavonic languages, in particular TTS synthesis technology for Belarusian, Polish and Russian. A typological analysis of the peculiarities of the phonemic and allophonic systems of Belarusian, Polish and Russian is given. Based on the resu...
Towards Partial Word Sense Disambiguation Tools for Czech
Complex applications in natural language processing, such as syntactic analysis, semantic annotation, machine translation and especially word sense disambiguation, consist of several relatively simple independent tasks. Czech, a Slavonic language with many inflectional features, requires more effort for such tasks than other languages. In this article we present two ...